Infectious diseases are an important public health issue. We therefore examine overall communicable disease rates, and their trends over time, for infectious diseases reported in California. Sexually transmitted diseases are analyzed separately from the other groups of infectious diseases.
We created new grouping variables to facilitate data presentation and analysis. The new grouping variables are:
Name of California region: assigns each county to one of the 10 California regions.
Type of infectious disease: groups each of the reported diseases by “type of disease”, following conventional microbiology classification.
The California regions are as follows:
Superior <- “NEVADA”, “PLACER”, “PLUMAS”, “SACRAMENTO”, “SHASTA”, “SIERRA”, “SISKIYOU”, “SUTTER”, “TEHAMA”, “YOLO”, “YUBA”, “MODOC”, “EL DORADO”, “BUTTE”, “GLENN”, “LASSEN”
North Coast <- “DEL NORTE”, “HUMBOLDT”, “LAKE”, “MENDOCINO”, “NAPA”, “SONOMA”, “TRINITY”
Bay area <- “ALAMEDA”, “CONTRA COSTA”, “MARIN”, “SAN FRANCISCO”, “SAN MATEO”, “SANTA CLARA”, “SOLANO”
North San Joaquin Valley <- “ALPINE”, “AMADOR”, “CALAVERAS”, “MADERA”, “MARIPOSA”, “MERCED”, “MONO”, “SAN JOAQUIN”, “STANISLAUS”, “TUOLUMNE”
Central Coast <- “MONTEREY”, “SAN BENITO”, “SAN LUIS OBISPO”, “SANTA BARBARA”, “SANTA CRUZ”, “VENTURA”
South San Joaquin Valley <- “FRESNO”, “INYO”, “KERN”, “KINGS”, “TULARE”
Inland Empire <- “RIVERSIDE”, “SAN BERNARDINO”
LA County <- “LOS ANGELES”
Orange County <- “ORANGE”
San Diego and Imperial County <- “IMPERIAL”, “SAN DIEGO”
We will also have California as a total.
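The county-to-region assignment above can be expressed as a small lookup table. The following is a minimal base R sketch, not the code used for this report: the names `regions`, `lookup` and `county_to_region` are hypothetical, and only three of the ten regions are spelled out.

```r
# Hypothetical sketch: build a county -> region lookup from named vectors.
# Only three of the ten regions are shown; the others follow the same pattern.
regions <- list(
  "Bay area"      = c("ALAMEDA", "CONTRA COSTA", "MARIN", "SAN FRANCISCO",
                      "SAN MATEO", "SANTA CLARA", "SOLANO"),
  "Inland Empire" = c("RIVERSIDE", "SAN BERNARDINO"),
  "LA County"     = c("LOS ANGELES")
)

# stack() turns the named list into a two-column data frame (values, ind).
lookup <- stack(regions)
names(lookup) <- c("county", "region")
lookup$region <- as.character(lookup$region)

# Map county names to regions; counties not in the lookup become NA,
# which makes missing assignments easy to spot.
county_to_region <- function(county) {
  lookup$region[match(toupper(county), lookup$county)]
}

county_to_region(c("Marin", "LOS ANGELES", "FRESNO"))
```

Returning NA for unmatched counties is a deliberate choice in this sketch, so that a county missing from every region list shows up in the data instead of being silently dropped.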
The groups of infectious diseases will be as follows:
1. Parasitic <- c(“Amebiasis”, “Babesiosis”, “Cryptosporidiosis”, “Cyclosporiasis”, “Cysticercosis or Taeniasis”, “Malaria”, “Giardiasis”, “Trichinosis”)
2. Toxin_related <- c(“Botulism, Foodborne”, “Botulism, Other”, “Botulism, Wound”, “Ciguatera Fish Poisoning”, “Domoic Acid Poisoning”, “Paralytic Shellfish Poisoning”, “Scombroid Fish Poisoning”)
3. viral <- c(“Chikungunya Virus Infection”, “Dengue Virus Infection”, “Flavivirus Infection of Undetermined Species”, “Hantavirus Infection”, “Hepatitis E acute infection”, “Rabies, human”, “Yellow Fever”, “Zika Virus Infection”)
4. prions <- c(“Creutzfeldt-Jakob Disease and other Transmissible Spongiform Encephalopathies”)
5. fungal <- c(“Coccidioidomycosis”)
6. Bacterial <- c(“Anaplasmosis”, “Anaplasmosis and Ehrlichiosis”, “Anthrax”, “Brucellosis”, “Campylobacteriosis”, “Cholera”, “E. coli O157”, “E. coli Other STEC (non-O157)”, “Legionellosis”, “Leprosy (Hansen’s Disease)”, “Leptospirosis”, “Listeriosis”, “Lyme Disease”, “Plague, human”, “Q Fever”, “Spotted Fever Rickettsiosis”, “Streptococcal Infection (cases in food and dairy workers)”, “Ehrlichiosis”, “Psittacosis”, “Salmonellosis”, “Shigellosis”, “Tularemia”, “Typhoid Fever”, “Paratyphoid Fever”, “Typhus Fever”, “Relapsing Fever”, “Shiga toxin-producing E. coli (STEC) without Hemolytic Uremic Syndrome (HUS)”, “Vibrio Infection (non-Cholera)”, “Shiga Toxin Positive Feces (without culture confirmation)”, “Yersiniosis”)
7. Infectious_complications <- c(“Hemolytic Uremic Syndrome (HUS) without evidence of Shiga toxin-producing E. coli (STEC)”, “Hemolytic Uremic Syndrome (HUS)”, “Shiga toxin-producing E. coli (STEC) with Hemolytic Uremic Syndrome (HUS)”)
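Because the grouping depends on exact string matches, a disease name that differs only by a comma or a capital letter would be left unclassified. The sketch below, in base R, shows one way such a grouping could be checked. It is an illustration under assumptions: `disease_groups` is abbreviated, and `check_groups` and `reported` are hypothetical names that do not appear in the analysis code.

```r
# Abbreviated version of the grouping above, for illustration only.
disease_groups <- list(
  Parasitic = c("Amebiasis", "Giardiasis", "Malaria"),
  fungal    = c("Coccidioidomycosis"),
  viral     = c("Dengue Virus Infection", "Zika Virus Infection")
)

# Hypothetical helper: report names listed in more than one group and
# reported disease names that no group covers.
check_groups <- function(groups, reported) {
  all_names <- unlist(groups, use.names = FALSE)
  list(
    duplicated   = unique(all_names[duplicated(all_names)]),
    unclassified = setdiff(reported, all_names)
  )
}

# Example input: the last name is not in any group of the abbreviated list.
reported <- c("Giardiasis", "Coccidioidomycosis", "Hepatitis E, acute infection")
result <- check_groups(disease_groups, reported)
result$unclassified
```

Running a check of this kind against the distinct disease names in the data would flag spelling mismatches before they affect the rate tables.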
my_table_data <- ID_tableyears_group_total %>%
select(c("ID_type","region","rate","time_period")) %>%
filter(ID_type=="Bacterial"|ID_type== "Parasitic"|ID_type=="Fungal"|ID_type=="Viral") %>%
filter(region=="California")%>%
drop_na(rate) %>%
group_by(ID_type,time_period,region) %>%
summarise(cumm_rate = sum (rate))
## `summarise()` regrouping output by 'ID_type', 'time_period' (override with `.groups` argument)
my_new_table_data <- my_table_data %>%
pivot_wider(names_from=c(ID_type),values_from= "cumm_rate")
kable(my_new_table_data,
booktabs=T,
col.names=c("Time Period", " ","Bacterial", "Fungal", "Parasitic", "Viral"),
align='lccc',
caption="Infectious disease rates (Cases/100,000) over time by disease etiology (from 2001 - 2018 by 3 year increments)",
format.args=list(big.mark=","))
| Time Period | | Bacterial | Fungal | Parasitic | Viral |
|---|---|---|---|---|---|
| 2001-2003 | California | 109.46 | 14.69 | 30.03 | 0.07 |
| 2004-2006 | California | 101.86 | 23.37 | 26.38 | NA |
| 2007-2009 | California | 102.86 | 21.00 | 24.11 | 0.10 |
| 2010-2012 | California | 108.82 | 36.58 | 20.97 | 0.55 |
| 2013-2015 | California | 124.43 | 22.77 | 21.15 | 1.04 |
| 2016-2018 | California | 142.98 | 52.33 | 27.80 | 1.76 |
##Mai
my_bayarea_table <- ID_tableyears_group_total %>%
select(c("ID_type","region","rate","time_period")) %>%
filter(ID_type=="Bacterial"|ID_type== "Parasitic"|ID_type=="Fungal"|ID_type=="Viral") %>%
filter(region=="Bay_area") %>%
drop_na(rate) %>%
#group_by(ID_type, time_period)%>%
group_by(ID_type,time_period,region) %>%
summarise(cumm_rate = sum (rate))
## `summarise()` regrouping output by 'ID_type', 'time_period' (override with `.groups` argument)
my_new_bayarea_table <- my_bayarea_table %>%
pivot_wider(names_from=c(ID_type),values_from= "cumm_rate")
# Table for bay area only :
kable(my_new_bayarea_table,
booktabs=T,
col.names=c("Time_Period", " ", "Bacterial", "Fungal", "Parasitic", "Viral"),
align='lccc',
caption="Infectious disease rates over time in the Bay Area from 2001-2018 by etiology of disease and time period (3-year cumulative)",
format.args=list(big.mark=","))
| Time_Period | | Bacterial | Fungal | Parasitic | Viral |
|---|---|---|---|---|---|
| 2001-2003 | Bay_area | 945.44 | NA | 342.40 | NA |
| 2004-2006 | Bay_area | 947.73 | 10.53 | 306.70 | NA |
| 2007-2009 | Bay_area | 932.71 | 9.76 | 254.58 | NA |
| 2010-2012 | Bay_area | 992.36 | 14.77 | 250.65 | NA |
| 2013-2015 | Bay_area | 1,136.90 | 20.89 | 228.59 | NA |
| 2016-2018 | Bay_area | 1,300.47 | 42.55 | 343.04 | 3.47 |
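The values in both tables come from `cumm_rate = sum(rate)`, that is, from adding up rate values across rows. A sum of rates is not the same quantity as a pooled rate (total cases divided by total population), which is how the Figure 2 code computes `rate_average`. The base R sketch below illustrates the difference with made-up numbers. Whether this explains the gap between the Bay Area and statewide values is an assumption, since the row structure of `ID_tableyears_group_total` is not shown here.

```r
# Illustrative numbers only (not taken from the report's data).
counties <- data.frame(
  county     = c("A", "B"),
  cases      = c(50, 10),
  population = c(1000000, 100000)
)

# Per-county rates per 100,000 population.
counties$rate <- counties$cases / counties$population * 100000

# Adding the two county rates together.
sum_of_rates <- sum(counties$rate)

# Pooled rate: all cases over all population, per 100,000.
pooled_rate <- sum(counties$cases) / sum(counties$population) * 100000

c(sum_of_rates = sum_of_rates, pooled_rate = pooled_rate)
```

In this toy example the summed rate is larger than the pooled rate, and it grows with every additional row that is added, whereas the pooled rate stays on the per-100,000 scale.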
#Figures and codes:
# Creating table for figure 1 take 2:
ID_tableyears_group_total_fig1Ltake2 <- ID_table3californiaonly %>%
select(c("ID_type","rate_total","time_period")) %>%
# time_period=as.character(time_period)%>%
drop_na(rate_total) %>%
group_by(time_period, ID_type)%>%
mutate(average_rate=sum(rate_total)/3)%>%
distinct(average_rate, .keep_all = TRUE)
ggplot(ID_tableyears_group_total_fig1Ltake2, aes(x = ID_type, y = average_rate)) +
geom_bar(aes(fill=time_period), stat="identity", position = position_dodge()) +
#geom_col(aes(fill=), col)
scale_y_continuous(labels = function(x) format(x,big.mark=",",scientific=FALSE))+
#scale_fill_manual(name= "time_period") +
scale_fill_discrete(name= "time_period")+
#values=c("#ffd333","#ff6600","#be0f24","#f91cc7","#910ff5","#003884")) +
labs(x="Group of Infectious disease", y = "Average 3 year rate",
title = "Figure 1: Trend of infectious diseases over time from 2001-2018 by type of
Disease and time period (3 year averages)")
Figure 1 shows that, among the reported infectious diseases (excluding sexually transmitted diseases), bacterial diseases are the most commonly reported, followed by fungal and then parasitic diseases. Viral diseases are reported at a lower rate. These numbers do not necessarily translate into true prevalence, since many diseases are not considered “reportable” because they are common and ubiquitously distributed. In general, the frequency of reported bacterial, fungal and viral diseases has increased over the years, while parasitic diseases have decreased, except in 2016-2018, when they increased.
#take 2 figure 2
#Figure 2L Trends from 2001-2018 for Bacterial, Fungal, Parasitic and Viral conditions
ID_tableyears_group_total_fig2Ltake2 <- IDtable3 %>%
select(c("ID_type","rate","year","cases")) %>%
mutate(year=as.character(year)) %>%
# filter(region=="California")%>%
filter(ID_type=="Bacterial"|ID_type== "Parasitic"|ID_type=="Fungal"|ID_type=="Viral") %>%
#drop_na(rate) %>%
group_by(ID_type, year)%>%
summarize(sum_cases=sum(cases))%>%
cbind(populationv)%>%
rename(pop_total="...4")%>%
mutate(rate_average= (sum_cases/pop_total)*100000)
## `summarise()` regrouping output by 'ID_type' (override with `.groups` argument)
## New names:
## * NA -> ...4
#using plotly
plot_ly(
ID_tableyears_group_total_fig2Ltake2,
x= ~year,
y= ~rate_average,
color= ~ID_type,
type="bar"
) %>%
layout(barmode="stack")%>%
layout(
title = "Figure 2 : Trends of Bacterial, Fungal, Parasitic and Viral Disease
rates per 100,000 from 2001-2018 in California
(Excludes STD's)",
xaxis = list(title = "Years"),
yaxis = list(title = "California Rates per 100,000")
)
## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
Figure 2: Reports of bacterial diseases have increased over the years. This increase could reflect a real increase in reportable cases, improved reporting methodology, or both. The same applies to fungal infections. Parasitic infections have decreased, except in 2016-2018, when they increased. Reports of viral infections have increased since 2016.
#Take 2 figure 3 #Lourdes most common bacterial diseases from 2001 to 2018:
ID_mostcommon_bacterial_fig3Ltake2 <- IDtable3 %>%
select(c("disease","rate","year","cases","population","sex","ID_type")) %>%
mutate(year=as.character(year)) %>%
filter(ID_type=="Bacterial") %>%
drop_na(rate) %>%
group_by(disease) %>%
mutate(total_cases_average = sum(cases)/18) %>%
distinct(total_cases_average, .keep_all=TRUE)%>%
filter(total_cases_average>120)
#%>%
#summarize(cum_disease = sum(disease)) %>%
#mutate(average_cases= (cum_disease/18))
#using plotly
plot_ly(
ID_mostcommon_bacterial_fig3Ltake2,
x= ~disease,
y= ~total_cases_average,
color= ~disease,
type="bar") %>%
# layout(barmode="stack")%>%
layout(
title = "Figure 3 :Most common reported Bacterial Infections
average number of cases per year from 2001-2018 in California
(Excludes STD's)",
xaxis = list(title = "Bacterial Infections"),
yaxis = list(title = "Average number of cases per year")
)
Figure 3: Among the bacterial infections, the most commonly reported is Campylobacteriosis, followed by Salmonellosis and Shigellosis.
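The Figure 3 code selects diseases with a hard-coded cut-off (`total_cases_average>120`). An alternative is to rank the diseases and keep the top few, which does not need a threshold chosen by eye. The base R sketch below uses made-up case counts. The names `reports`, `avg` and `top_n_diseases` are hypothetical, and it averages over the years present in the data rather than dividing by a fixed 18.

```r
# Illustrative yearly case counts (made-up numbers, not the report's data).
reports <- data.frame(
  disease = rep(c("Campylobacteriosis", "Salmonellosis", "Tularemia"), each = 2),
  year    = rep(c(2001, 2002), times = 3),
  cases   = c(600, 640, 450, 470, 2, 4)
)

# Average number of cases per year for each disease.
avg <- aggregate(cases ~ disease, data = reports, FUN = mean)
names(avg)[2] <- "avg_cases_per_year"

# Hypothetical helper: sort by average cases (descending) and keep the top n.
top_n_diseases <- function(df, n) {
  head(df[order(-df$avg_cases_per_year), ], n)
}

top2 <- top_n_diseases(avg, 2)
as.character(top2$disease)
```

With this approach the same code could be reused for the bacterial, parasitic and viral charts by changing only `n`.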
#Take 2 Figure 4 will create a new bar chart for most common reported Parasitic diseases rates in California
ID_mostcommon_parasitic_fig4Ltake2 <- IDtable3 %>%
select(c("disease","rate","year","cases","population","sex","ID_type")) %>%
mutate(year=as.character(year)) %>%
filter(ID_type=="Parasitic") %>%
drop_na(rate) %>%
group_by(disease) %>%
mutate(total_cases_average = sum(cases)/18) %>%
distinct(total_cases_average, .keep_all=TRUE)%>%
filter(total_cases_average>14)
#using plotly
plot_ly(
ID_mostcommon_parasitic_fig4Ltake2,
x= ~disease,
y= ~total_cases_average,
color= ~disease,
type="bar") %>%
# layout(barmode="stack")%>%
layout(
title = "Figure 4 :Most common reported Parasitic Diseases
yearly average 2001-2018 in California",
xaxis = list(title = "Parasitic Infections"),
yaxis = list(title = "Average number of cases per year")
)
Figure 4: Among parasitic infections, Giardiasis is the most common, followed by Amebiasis and Cryptosporidiosis.
#Take 2 Figure 5 will create a new bar chart for most common reported fungal diseases rates in California
ID_mostcommon_fungal_fig5Ltake2 <- IDtable3 %>%
select(c("disease","rate","year","cases","population","sex","ID_type")) %>%
mutate(year=as.character(year)) %>%
filter(ID_type=="Fungal") %>%
drop_na(rate) %>%
group_by(disease) %>%
mutate(total_cases_average = sum(cases)/18) %>%
distinct(total_cases_average, .keep_all=TRUE)
#using plotly
plot_ly(
ID_mostcommon_fungal_fig5Ltake2,
x= ~disease,
y= ~total_cases_average,
color= ~disease,
type="bar")%>%
# layout(barmode="stack")%>%
layout(
title = "Figure 5 : Most common reported Fungal Infections
average number of cases per year from 2001-2018 in California
(Excludes STD's)",
xaxis = list(title = "Fungal infections"),
yaxis = list(title = "Average number of cases per year")
)
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
Figure 5: Among fungal infections, the only one reported over the 18-year period was Coccidioidomycosis.
#take 2 Lourdes; will create a new bar chart for most common reported viral diseases in California
ID_mostcommon_viral_fig6Ltake2 <- IDtable3 %>%
select(c("disease","rate","year","cases","population","sex","ID_type")) %>%
mutate(year=as.character(year)) %>%
filter(ID_type=="Viral") %>%
drop_na(rate) %>%
group_by(disease) %>%
mutate(total_cases = sum(cases)) %>%
distinct(total_cases, .keep_all=TRUE)
#filter(total_cases>14)
#using plotly
plot_ly(
ID_mostcommon_viral_fig6Ltake2,
x= ~disease,
y= ~total_cases,
color= ~disease,
type="bar")%>%
# layout(barmode="stack")%>%
layout(
title = "Figure 6 :Most common reported Viral Infections
total cases from 2001-2018 in California
(Excludes STD's)",
xaxis = list(title = "Viral infections"),
yaxis = list(title = "Total cases")
)
Figure 6: Among viral infections, the most commonly reported was Dengue virus infection. The emerging Chikungunya and Zika virus infections were not reported in California until 2017.
#sandya-create disease trend over time
individualdata1<- read_csv("stds-by-disease-county-year-sex.csv")
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## Disease = col_character(),
## County = col_character(),
## Year = col_double(),
## Sex = col_character(),
## Cases = col_double(),
## Population = col_double(),
## Rate = col_double(),
## `Lower 95% CI` = col_double(),
## `Upper 95% CI` = col_double(),
## `Annotation Code` = col_character()
## )
groupdata<-read_csv("idb_odp_2001-2018 (1) (1).csv")
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## Disease = col_character(),
## County = col_character(),
## Year = col_double(),
## Sex = col_character(),
## Cases = col_double(),
## Population = col_double(),
## `Lower 95% CI` = col_double(),
## `Upper 95% CI` = col_double(),
## Rate = col_character()
## )
groupdatafinal<-filter(groupdata, County %in% c("ALAMEDA", "SANTA CLARA", "SAN MATEO", "SAN FRANCISCO", "MARIN", "CONTRA COSTA", "SOLANO"))
individualdatafinal<-filter(individualdata1, County %in% c("Alameda", "Santa Clara", "San Mateo", "San Francisco", "Marin", "Contra Costa", "Solano"))%>% select(-10)
combineddata<-rbind(groupdatafinal, individualdatafinal)
combineddata$County<-tolower(combineddata$County)
combineddata$Sex<-tolower(combineddata$Sex)
combineddata1<-combineddata%>%filter(Sex %in% "total")
combinedatafinal <- combineddata1 %>%
  mutate(Disease_Type = case_when(
    Disease %in% c("Gonorrhea", "Early Syphilis", "Chlamydia") ~ "STD (Bacterial)",
    Disease %in% c(
      "Shiga toxin-producing E. coli (STEC) with Hemolytic Uremic Syndrome (HUS)",
      "Shiga toxin-producing E. coli (STEC) without Hemolytic Uremic Syndrome (HUS)",
      "Anaplasmosis and Ehrlichiosis",
      "Hemolytic Uremic Syndrome (HUS) without evidence of Shiga toxin-producing E. coli (STEC)",
      "Paratyphoid Fever", "Ehrlichiosis", "Anaplasmosis",
      "Shiga Toxin Positive Feces (without culture confirmation)",
      "E. coli Other STEC (non-O157)", "Yersiniosis", "Vibrio Infection (non-Cholera)",
      "Typhoid Fever", "Typhus Fever", "Tularemia",
      "Streptococcal Infection (cases in food and dairy workers)",
      "Spotted Fever Rickettsiosis", "Shigellosis", "Salmonellosis", "Relapsing Fever",
      "Q Fever", "Psittacosis", "Plague, human", "Lyme Disease", "Listeriosis",
      "Leptospirosis", "Leprosy (Hansen's Disease)", "Legionellosis", "E. coli O157",
      "Cholera", "Campylobacteriosis", "Brucellosis", "Anthrax") ~ "Bacteria",
    Disease %in% c(
      "Chikungunya Virus Infection", "Dengue Virus Infection",
      "Flavivirus Infection of Undetermined Species", "Hantavirus Infection",
      "Hepatitis E, acute infection", "acute infection", "Rabies, human",
      "Yellow Fever", "Zika Virus Infection") ~ "Virus",
    Disease %in% c(
      "Amebiasis", "Babesiosis", "Cryptosporidiosis", "Cyclosporiasis",
      "Cysticercosis or Taeniasis", "Malaria", "Giardiasis", "Trichinosis") ~ "Protozoa",
    Disease %in% c(
      'Botulism, Foodborne', 'Botulism, Other', 'Botulism, Wound',
      'Ciguatera Fish Poisoning', 'Domoic Acid Poisoning',
      'Paralytic Shellfish Poisoning', 'Scombroid Fish Poisoning') ~ "Toxin",
    Disease %in% c('Creutzfeldt-Jakob Disease and other Transmissible Spongiform Encephalopathies') ~ "Prion",
    Disease == 'Hemolytic Uremic Syndrome (HUS)' ~ "Infectious Complication",
    Disease == "Coccidioidomycosis" ~ "Fungal"))
data1<-combinedatafinal%>%select(-9)
data1$Rate<-(data1$Cases/data1$Population)*100000
plotdatatest<-data1%>%group_by(Disease_Type, Year)%>%summarize(Sum_cases=sum(Cases))
## `summarise()` regrouping output by 'Disease_Type' (override with `.groups` argument)
testin<-data1%>%group_by(County, Year)%>%summarize(total_pop=mean(Population))%>%group_by(Year)%>%summarise(totalp=sum(total_pop))
## `summarise()` regrouping output by 'County' (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
hope<-left_join(testin, plotdatatest, by= "Year")
hope$Overall_Rate<-(hope$Sum_cases/hope$totalp)*100000
ggplot(hope, aes(x = Year, y = Overall_Rate)) +
  facet_wrap(vars(Disease_Type), ncol = 2) +
  geom_line(aes(color = Disease_Type)) +
  labs(x = "Year", y = "Overall Rate (per 100K)",
       title = "Overall Rates of Infectious diseases (including STDs) in Bay Area Counties from 2001-2018") +
  theme_minimal()
Interpretation of graph: This graph shows the overall yearly rates of the different types of infectious disease in the Bay Area counties from 2001 to 2018. STD rates are visibly increasing much faster than the rates of the other types of infectious disease.
ggplot(hope, aes(x = Year, y = Overall_Rate)) +
  facet_wrap(vars(Disease_Type), ncol = 2, scales = "free_y") +
  geom_line(aes(color = Disease_Type)) +
  labs(x = "Year", y = "Overall Rate (per 100K)",
       title = "Overall Rates of Infectious diseases (including STDs) in Bay Area Counties from 2001-2018") +
  theme_minimal()
Interpretation of graph: This graph shows the same overall yearly rates by type of infectious disease from 2001 to 2018 in the Bay Area counties, but each panel now has its own y-axis (scales = "free_y"). I created this graph to better visualize the trends in diseases other than STDs, which are hard to see when every panel shares the STD scale. Fungal rates are increasing over time, whereas toxin, prion and infectious complication rates remain at a very low, steady level. Bacterial infection rates remain higher than those of the other non-STD disease types but appear steady over time.
combineddatagender <- individualdatafinal %>%
  filter(Sex %in% c("Male", "Female")) %>%
  select(-9) %>%
  group_by(Sex, Year) %>%
  summarise(case_total = sum(Cases))
## `summarise()` regrouping output by 'Sex' (override with `.groups` argument)
std_set<-left_join(testin, combineddatagender, by= "Year")
std_set$Overall_Rate<-(std_set$case_total/std_set$totalp)*100000
ggplot(std_set, aes(x = Year, y = Overall_Rate)) +
  geom_line(aes(color = Sex)) +
  labs(x = "Year", y = "Overall Rate",
       title = "Overall rates per year of STDs (Bacterial) in Bay Area Counties from 2001-2018 separated by Sex") +
  theme_minimal()
Interpretation of graph: This graph shows the overall yearly rates of bacterial STDs in the Bay Area counties from 2001 to 2018, separated by sex. I created this graph to better visualize the trends in STDs among males and females. The graph shows a marked increase in the overall rate of STDs for both males and females. Until around 2014, female rates appear to be higher than male rates. From around 2014 onward, male rates increase even faster.
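In the code above, male and female case totals are divided by `totalp`, which is built from the rows where Sex is "total". If the `Population` column of the STD file holds sex-specific populations (an assumption that is not verified here), a sex-specific rate would use those as the denominator instead. The base R sketch below compares the two choices with made-up numbers; the object names are hypothetical.

```r
# Illustrative numbers only (not taken from the STD data file).
std <- data.frame(
  sex        = c("Male", "Female"),
  cases      = c(300, 200),
  population = c(50000, 50000)  # assumed sex-specific populations
)
total_population <- sum(std$population)

# Rate per 100,000 using the total population as the denominator.
rate_total_denominator <- std$cases / total_population * 100000

# Rate per 100,000 using each sex's own population as the denominator.
rate_sex_denominator <- std$cases / std$population * 100000

data.frame(sex = std$sex, rate_total_denominator, rate_sex_denominator)
```

When the male and female populations are of similar size, both denominators give the same ordering and trend, but the rates computed against the total population are roughly half as large.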
my_final_try <- 1